Abstract. We present a generalisation of King's symbolic execution
technique called compact symbolic execution. It is based on the concept
of templates: a template is a declarative parametric description of a
program part that generates paths in the symbolic execution tree with
regularities in the program states along them. Typical sources of these
paths are program loops and recursive calls. Using the templates, we fold
the corresponding paths into single vertices and therefore considerably
reduce the size of the tree without any loss of information. There are even
programs for which compact symbolic execution trees are finite even though
the classic symbolic execution trees are infinite.



1 Introduction 

Classic symbolic execution as proposed by King in 1976 [5] systematically
explores all real paths in an analysed program. There is typically a huge
(or even infinite) number of real paths even for very small and simple
programs. Exploring all of them therefore becomes a serious problem, known
as the path explosion problem.

Compact symbolic execution also explores all real program paths, but in a
very compact manner. We analyse a given program before we start its symbolic
execution. We look for those parts of the program which might produce real
paths with some regularities in the program states along them. Typically,
program loops and recursion produce these regularities. We analyse the
program parts independently of the remainder of the program. If the analysis
of a part succeeds, then the result is a template, i.e. a declarative
parametric description of the complete behaviour of the analysed part.
The output of the program analysis is therefore a set of templates. Now we
can execute the program symbolically with the templates. Until we reach one
of the successfully analysed program parts, we proceed just like in classic
symbolic execution. Let us now suppose we have just reached such a part.
Having a template for the part, we do not need to symbolically execute the
interior of the part. We just instantiate the template at the end of the
current path and then we jump behind the part, where we continue with
classic symbolic execution again.

The worst case for compact symbolic execution occurs when we fail to compute
any template for a given program. Compact symbolic execution then reduces to
the classic one, and we gain no space savings.



2 Overview 

In this section we give an intuition of compact symbolic execution. For
simplicity of presentation we use the following definition of a program.
Although our programs are simple, they support typical imperative constructs.

Definition 1 (Program) A program is a collection of functions and global
variables. Each function has its own local variables. All program variables
and functions have different names. Exactly one function is marked as the
starting one. Each function is represented as an oriented graph. Vertices in
the graph identify program locations, while edges define transitions between
them. We distinguish a single entry and a single exit location in each graph.
There is no in-edge to the entry location and there is no out-edge from the
exit one. We label edges by actions to be taken when moving between connected
locations. An action can be

(1) An assignment of the form <variable>:=<expression>, 

(2) Call by value statements 

(a) <variable>:=<function-name>(<arg-list>) , or 

(b) <function-name> (<arg-list>) 

(3) A return value statement ret, <expression> , 

(4) skip statement, which does nothing, or 

(5) A boolean expression over program variables. 

If an edge e = (u,v) is labelled by one of the actions (1)-(4), then the
out-degree of u is 1. Otherwise, the label of e is an action (5), the
out-degree of u is 2 and its out-edges are labelled by boolean expressions
$\gamma$ and $\neg\gamma$. No action (2) can reference the starting function.
For simplicity we consider neither pointer arithmetic nor heap allocations.
We prevent invalid operations in actions (like division by zero, etc.) by
branchings into error locations. An error location is any location with a
single out-edge heading back to that location and labelled with the skip
action.

We can see an example of a program in Figure 1(a). The depicted function
linSrch returns the least index i into the array A such that A[i] == x. If x
is not in A at all, then it returns -1.
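Since the figure itself is not reproduced here, the following is a minimal Python sketch of a function with the behaviour just described. The loop structure and the name lin_srch are assumptions made for illustration; only the input/output behaviour is taken from the text.

```python
def lin_srch(A, n, x):
    """Return the least index i < n with A[i] == x, or -1 if there is none."""
    i = 0
    while i < n:        # loop head: branching on i < n
        if A[i] == x:   # leave the loop with the index found
            return i
        i += 1          # next iteration of the cycle
    return -1           # x does not occur in A[0..n-1]
```

Each iteration of the while loop corresponds to one pass around a cyclic path in the graph representation, and the two return statements correspond to the two ways of leaving that cycle.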

We first briefly describe classic symbolic execution as proposed by King [8].
Instead of passing concrete data into parameters of the starting function, we
pass symbols from a set $\{\alpha_0, \alpha_1, \ldots\}$. Let us suppose we
pass symbols $\alpha_0$ and $\alpha_1$ to variables a and b respectively.
After executing an action c:=2*a+b the variable c will contain the symbolic
expression $2\alpha_0 + \alpha_1$ as its value. A symbolic memory is a
function $\theta$ from program variables to a set of symbolic expressions. We
further maintain a boolean symbolic expression $\varphi$ called the path
condition. It represents a complete identifier of a particular program path
taken during an execution. $\varphi$ is initially true and it can be updated
at program branchings. Let $\theta$ be a symbolic memory having
$\theta(a) = \alpha_0$, $\theta(b) = \alpha_1$ and
$\theta(c) = 2\alpha_0 + \alpha_1$, and let c-a>2*b and c-a<=2*b be actions of
out-edges of a branching location. For the first action we proceed as follows.
We evaluate the action in $\theta$. The result is the boolean symbolic
expression $\alpha_0 + \alpha_1 > 2\alpha_1$. If
$\varphi \rightarrow (\alpha_0 + \alpha_1 > 2\alpha_1)$ is satisfiable, we
update $\varphi$ to $\varphi \wedge (\alpha_0 + \alpha_1 > 2\alpha_1)$ and we
continue the execution by crossing the edge having the action. Then we proceed
similarly for the second action. Note that if both implications are
satisfiable, we fork the execution into two parallel and independent
executions. Besides a symbolic memory and a path condition we commonly have a
call stack $\Sigma$ and we also need to identify a current program location
$l$. Putting all these things together we get a program state represented by
a tuple $s = (\theta, \varphi, \Sigma, l)$. Note that we understand call
stack records as pairs $(\sigma, l)$, where $l$ is a return location and
$\sigma$ is a restriction of a symbolic memory to local variables. Further,
we commonly describe the symbolic execution of a program by a tree structure
called the symbolic execution tree. Vertices of the tree are related to
program locations visited during the execution and edges reflect transitions
between the locations. Each vertex of the tree is labelled by a related
program state. But instead of labels T and F for branching edges (as proposed
by King), we label them by evaluated actions of the branching edges.
Figure 1(b) depicts a part of the symbolic execution tree of the example
program from Figure 1(a) (with omitted program states labelling the
vertices). Please ignore the grey regions in the tree for now. We assume that
classic symbolic execution of the program started with an initial symbolic
memory $\theta = \{(i, \alpha_0), (n, \alpha_1), (x, \alpha_2), (A, \alpha_3)\}$.
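The evaluation of the actions c:=2*a+b and c-a>2*b can be sketched in a few lines of Python. The representation of symbolic expressions as dictionaries of coefficients, the helper names add and scale, and the symbol names alpha0 and alpha1 are all assumptions made for this illustration; the satisfiability check itself would need a solver and is not shown.

```python
def add(e1, e2):
    """Sum of two linear symbolic expressions given as {symbol: coefficient}."""
    out = dict(e1)
    for sym, c in e2.items():
        out[sym] = out.get(sym, 0) + c
    return {s: c for s, c in out.items() if c != 0}

def scale(k, e):
    """Multiply a linear symbolic expression by the integer constant k."""
    return {s: k * c for s, c in e.items() if k * c != 0}

# Symbolic memory theta: program variable -> symbolic expression.
theta = {"a": {"alpha0": 1}, "b": {"alpha1": 1}}

# Action c := 2*a + b stores the expression 2*alpha0 + alpha1 in c.
theta["c"] = add(scale(2, theta["a"]), theta["b"])

# Action c - a > 2*b evaluated in theta gives alpha0 + alpha1 > 2*alpha1.
lhs = add(theta["c"], scale(-1, theta["a"]))
rhs = scale(2, theta["b"])
branch_condition = (lhs, ">", rhs)

# The path condition is kept as a list of conjuncts; it starts out empty (true).
path_condition = [branch_condition]
```

A dictionary of coefficients covers only linear expressions, which is enough for this example; a real implementation would use a general expression tree.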

We often use the following dot-notation to access elements of tuples. If
$s = (\theta, \varphi, \Sigma, l)$ is a program state, then $s.\theta$ denotes
its symbolic memory, $s.\varphi$ denotes its path condition, $s.\Sigma$ is its
call stack and $s.l$ is the current program location. Further, if $u$ is a
vertex of a symbolic execution tree, then $u.s$ denotes the program state
labelling the vertex. Instead of $u.s.\theta$, $u.s.\varphi$, $u.s.\Sigma$
and $u.s.l$ we simply write $u.\theta$, $u.\varphi$, $u.\Sigma$ and $u.l$.
Finally, if $\Sigma$ is a call stack, then we use dot-notation to access the
record at the top of the call stack. So, for example, $\Sigma.l$ denotes the
return location of the record at the top of $\Sigma$.

Symbols $\{\alpha_0, \alpha_1, \ldots\}$ in classic symbolic execution
represent input values to the whole program. We generalise this concept to
allow symbolic execution of parts of an analysed program independently of the
remainder. Each such part uses the symbols $\{\alpha_0, \alpha_1, \ldots\}$
relative to a chosen entry location of the part. Then, using a composition of
program states (defined later), we can express any run of classic symbolic
execution as a composition of program states resulting from analyses of the
parts. Let $s = (\theta, \varphi, \Sigma, l)$ be a program state resulting
from a symbolic execution from a program location $l_0$ (e.g. the entry
location of the starting function) up to an entry location $l$ of an
independently analysed program part. Let
$s' = (\theta', \varphi', \Sigma', l')$ be a program state resulting from the
analysis of the part, i.e. $s'$ represents a symbolic execution from the
entry location $l$ to some exit location $l'$ of the part. Then
$s \circ s' = (\theta \circ \theta', \varphi \wedge \theta(\varphi'), \Sigma \circ (\theta \circ \Sigma'), l')$
is the composed program state representing symbolic execution from $l_0$ to
$l'$ through the analysed part (entered in location $l$). We can see that
composition of program states is implemented as composition of their
individual components. We discuss the details of these operations in
Section 3. Note only that the composed path condition is
$\varphi \wedge \theta(\varphi')$ rather than $\varphi \wedge \varphi'$. This
is because $\varphi'$ may contain some symbols, but they are related to the
entry location $l$ of the analysed part and not to the location $l_0$.
Therefore, we have to compose $\varphi'$ with $\theta$ first to express
$\varphi'$ in terms of symbols relative to location $l_0$. The compositions
$\theta \circ \theta'$ and $\theta \circ \Sigma'$ shift symbols from location
$l$ to $l_0$ in the same way.
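As a small worked example of this composition (the concrete memories and path conditions below are invented for illustration and are not taken from the figures), consider a program with a single variable $a$ whose symbol is $\alpha_0$, and empty call stacks:

```latex
\begin{align*}
s  &= (\theta, \varphi, [\,], l),   & \theta(a)  &= \alpha_0 + 1, & \varphi  &= (\alpha_0 > 0),\\
s' &= (\theta', \varphi', [\,], l'), & \theta'(a) &= 2\alpha_0,    & \varphi' &= (\alpha_0 < 5),\\
(\theta \circ \theta')(a) &= \theta(\theta'(a)) = 2(\alpha_0 + 1), &
\theta(\varphi') &= (\alpha_0 + 1 < 5), \\
s \circ s' &= \bigl(\theta \circ \theta',\ (\alpha_0 > 0) \wedge (\alpha_0 + 1 < 5),\ [\,],\ l'\bigr).
\end{align*}
```

In $s'$ the symbol $\alpha_0$ stands for the value of $a$ at the entry location $l$ of the part, which is $\alpha_0 + 1$ in terms of the symbols at $l_0$; this is why $\varphi'$ becomes $\alpha_0 + 1 < 5$ and not $\alpha_0 < 5$ in the composed path condition.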



Fig. 1. (a) A program with a function linSrch(A,n,x) . (b) Symbolic execution tree 
of function linSrch. (c) Compact symbolic execution tree of function linSrch. 



In the symbolic execution tree in Figure 1(b) there is a single path
highlighted by a sequence of grey regions. Vertices in each region are
related to the same sequence of program locations: b, c, d, b. Moreover, we
enter the path at a vertex referencing location b and we can leave the path
either by stepping into a vertex referencing location e or into a vertex
referencing location f. Let us denote the entry vertex into the path as $b_0$
and the exit vertices from the path referencing locations e and f as
$e_0, e_1, \ldots$ and $f_0, f_1, \ldots$ respectively, indexed from the top
down. Our goal is to completely eliminate the path in grey from the tree,
while still representing all real program paths. One way to do so is to
represent the whole path by a single vertex, b say, with two direct
successors. The first successor, e say, represents all the exit vertices
$e_i$ from the path and the second, f say, represents all the exits $f_i$.
Note that the names of the vertices b, e and f also represent the program
locations they reference. We label the vertex b by the program state
labelling $b_0$. But the question is what program states we should assign to
the vertices e and f. Note that two different vertices $e_i$ and $e_j$ may be
labelled by different program states. So, for the vertex e we need to
introduce a program state $e.s[\kappa]$, parametrised by a parameter
$\kappa$, such that each program state $e_i.s$ can be equivalently expressed
by $e.s[\kappa]$, when $\kappa$ is substituted by some number $\nu$. Of
course, for different states $e_i.s$ and $e_j.s$ there are different numbers,
say $\nu_i$ and $\nu_j$, for parameter substitution. We similarly need a
parametrised program state $f.s[\kappa]$ for the vertex f. We compute the
states $e.s[\kappa]$ and $f.s[\kappa]$ before we start symbolic execution of
the program from Figure 1(a) by analysing




the following part of it. The part consists of all the locations b, c, d, e,
f discussed above and of all the edges between them. Note that the sequence
b, c, d, b of locations forms a cyclic path inside the analysed part. This
cycle is actually the source of the path in grey regions. Nevertheless, we
want to describe program states at exits from the part. The exits from the
part are target vertices of those edges of the part which do not belong to
the cycle. Therefore, locations e and f are the exits from the part. We also
identify the location b as the entry location into the part, since we can
enter the part by stepping into location b. The part is completely defined
now. We analyse it independently of the remainder of the program. It mainly
means that if we use some symbols $\alpha_i$ in the analysis, then they are
related to the entry location b of the part and not to the entry location of
the whole program. At this point we are more concerned with the formulation
of a result of the analysis and its usage than with the analysis itself.
Therefore, we postpone its description to Section 4. We assume here that the
key properties $e.s[\kappa]$ and $f.s[\kappa]$ from the analysis are already
computed, so we may formulate an output from the analysis of the part as the
following template

$t = (b, 2, \{(\theta_e, \varphi_e, [\,], e)[\kappa],\ (\theta_f, \varphi_f, [\,], f)[\kappa]\})$,

where b is the entry location to the analysed part, the number 2 identifies
the number of following parametrised program states and the remaining two
tuples are the parametrised program states $e.s[\kappa]$ and $f.s[\kappa]$
respectively. Note that [] identifies an empty call stack. The template
contains all the information we need to build a compact symbolic execution
tree, where the path in grey is folded as described above.

Let us symbolically execute the program in Figure 1(a) with the template t.
We construct a compact symbolic execution tree during the execution. The tree
is depicted in Figure 1(c). We apply classic symbolic execution until we
reach the entry location t.b. Let b be the vertex in the tree at which we
reach the location t.b and let s be the program state b.s. We now instantiate
the template. Since we have exactly two program states in t, we create
exactly two successor vertices e and f of the vertex b in the tree. The
vertices e and f reference locations t.e and t.f respectively and they are
further labelled by program states
$s \circ (t.\theta_e, t.\varphi_e, [\,], t.e)[\kappa]$ and
$s \circ (t.\theta_f, t.\varphi_f, [\,], t.f)[\kappa]$ respectively. We finish
the instantiation of t by creating edges (b, e) and (b, f) labelled by
symbolic expressions $s.\theta(t.\varphi_e[\kappa])$ and
$s.\theta(t.\varphi_f[\kappa])$ respectively. The situation is also depicted
in Figure 1(c). Then we continue from both vertices e and f independently
using classic symbolic execution again. Both of these executions reach the
function exit location g in one step and compact symbolic execution
terminates.

Let us now have a look at Figure 2(a) depicting a program with a function
countIf. The function counts the number of elements in array A having values
equal to x. We show the symbolic execution tree of the program in Figure 2(b).
There we can see several sequences of grey regions. According to our
experience with the previous example we can easily detect that all the paths
in grey are generated by a single program part consisting of locations c, d,
e, f, g and the edges between them. But there are two cyclic paths
$\pi = c, d, e, f, c$ and $\pi' = c, d, f, c$ inside






the part. Nevertheless, the grey regions highlight only the cycle $\pi$. So,
we ignore the cycle $\pi'$ and $\pi$ is therefore the only cycle we consider.
The remainder is now obvious. The locations f and g are exits from the part
and c is the entry location into the part. The analysis of the part
(discussed later in Section 3) computes the following template

$t = (c, 2, \{(\theta_f, \varphi_f, [\,], f)[\kappa],\ (\theta_g, \varphi_g, [\,], g)[\kappa]\})$

Compact symbolic execution with the template t computes the compact symbolic
execution tree depicted in Figure 2(c). The tree is basically a singly linked
list. Note that we instantiate the template each time we reach the location
c. But for each such instantiation we need a fresh parameter to prevent
collisions with parameters from previous instantiations. We assume we have
infinitely many different names for the parameters. Therefore, for the
program state $s_i$ at which the template is instantiated for the $i$-th time
and the fresh parameter $\kappa_i$, the edge labels in Figure 2(c) have the
form $s_i.\theta(t.\varphi_g[\kappa_i])$ and
$s_i.\theta(t.\varphi_f[\kappa_i])$, and the program states have the form
$s_i \circ (t.\theta_g, t.\varphi_g, [\,], g)[\kappa_i]$ and
$s_i \circ (t.\theta_f, t.\varphi_f, [\,], f)[\kappa_i]$.

The sequences of grey regions in the tree in Figure 2(b) go bottom left. But
imagine they went bottom right. Then each region would represent a sequence
of program locations c, d, f, c. If we analysed these sequences of grey
regions more closely, we would realise that there is a part of the program
from Figure 2(a) consisting of vertices c, d, f, e, g, where c, d, f, c is
the only cycle in the part, c is the entry location into the part and
locations e and g are exits from the part. If we further built a template
from the part and ran compact symbolic execution with it, we would also
obtain a compact symbolic execution tree forming basically a singly linked
list.

To summarise, a general scheme for compact symbolic execution of the examples
above is as follows. We enumerate cyclic paths in a given program one by one.
For each enumerated cycle we first complete the related program part
(determining its entry and exit locations) and then we compute a template for
the part. Then we run compact symbolic execution with the computed templates.
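The first two steps of this scheme, enumerating cycles and completing the related program part, can be sketched as follows. The graph below is a hypothetical rendering of linSrch with locations a to g (the edges are an assumption, since the figure is not available here), and the function names simple_cycles and complete_part are made up for this sketch.

```python
def simple_cycles(graph):
    """Enumerate simple cycles of a directed graph {vertex: [successors]}.

    Each cycle is reported once, starting at its smallest vertex.
    """
    cycles = []

    def dfs(start, v, path):
        for w in graph.get(v, []):
            if w == start:
                cycles.append(list(path))
            elif w > start and w not in path:
                dfs(start, w, path + [w])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles

def complete_part(graph, cycle):
    """Return (entry locations, exit locations) of the part built around cycle."""
    on_cycle = set(cycle)
    cycle_edges = {(cycle[i], cycle[(i + 1) % len(cycle)])
                   for i in range(len(cycle))}
    # Entries: cycle locations that can be stepped into from outside the cycle.
    entries = sorted({v for u, succs in graph.items() for v in succs
                      if v in on_cycle and u not in on_cycle})
    # Exits: targets of edges leaving cycle locations that are not cycle edges.
    exits = sorted({v for u in cycle for v in graph[u]
                    if (u, v) not in cycle_edges})
    return entries, exits

# Hypothetical location graph of linSrch (assumed, not copied from Figure 1).
graph = {"a": ["b"], "b": ["c", "e"], "c": ["d", "f"], "d": ["b"],
         "e": ["g"], "f": ["g"], "g": []}
cycles = simple_cycles(graph)
part = complete_part(graph, cycles[0])
```

For this graph the only cycle is b, c, d, its entry location is b and its exits are e and f, which agrees with the part discussed for linSrch above.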

3 Definition 

In this section, we give a precise definition of templates parametrised by a
single parameter and we present the compact symbolic execution algorithm
using these templates. We start with basic terms valid for compact symbolic
execution with any kind of templates. We assume for the rest of this section
that P is a program.

An injective function $\theta_0$ from the set of all program variables of P
to a set of symbols $\{\alpha_0, \alpha_1, \alpha_2, \ldots\}$ is an initial
symbolic memory of P. For each program variable a its symbol $\theta_0(a)$
represents some yet unknown value of that variable. So, $\theta_0(a)$ must
belong to the domain of a (i.e. $\theta_0(a)$ is of a's type). Further, a
numeric symbolic expression is an application of operators to numeric
constants and symbols. A boolean symbolic expression is either an equality or
inequality predicate over numeric symbolic expressions, or an application of
logical connectives to other boolean symbolic expressions. A symbolic
expression is either a numeric or a boolean symbolic expression. We have
already given the definition of symbolic memory, call stack and program state
in Section 2. But we in addition define for any program state
$s = (\theta, \varphi, \Sigma, l)$ that $\theta(a) = \theta_0(a)$ for each
local variable a undefined at location $l$. Also note that $\theta_0$ is just
a special symbolic memory.
The pseudo-code of Algorithm 1 represents two algorithms. If we consider only
unmarked lines, we get the algorithm of classic symbolic execution. If we add
lines marked with □ we get the algorithm of compact symbolic execution with
templates with a single parameter. The lines marked by * are responsible for
construction of the symbolic execution tree. Obviously, both classic and
compact symbolic execution can appear in both versions: with and without
construction of the tree.



Algorithm 1: executeSymbolically

Input:  P - program to be executed
        d - set of template detectors (only in □-version)
Output: E - set of final program states
        T - symbolic execution tree of P (only in *-version)

□   1  Let p be a set of all templates detected in P by detectors d
    2  $s_0$ := ($\theta_0$, true, [], entry location of the starting function)
    3  Let Q be a queue of program states initially containing only $s_0$
*   4  Create a root vertex of T labelled with $s_0$
    5  repeat
    6      Extract the first program state s from Q
    7      if s.l is the exit location of the starting function or an error location then
    8          Insert s into E
    9      else
   10          S := $\emptyset$
□  11          p' := getTemplatesAt(s.l, p)
□  12          if p' $\neq \emptyset$ then
□  13              t := chooseTemplate(p')
□  14              $\kappa$ := getFreshParam()
□  15              Replace all occurrences of the former parameter in t by $\kappa$
□  16              foreach i = 1, ..., t.n do
□  17                  s' := $s \circ (t.\theta_i, t.\varphi_i, t.\Sigma_i, t.l_i)[\kappa]$
□  18                  Insert s' into S
□  19          else /* applying classic symbolic execution step */
   20              S := computeClassicSuccessors(P, s)
*  21          Let u be a leaf of T whose label is s
   22          foreach program state s' $\in$ S such that $s'.\varphi$ is satisfiable do
   23              Insert s' at the end of Q
*  24              Insert a new vertex v labelled with s' into T
*  25              Insert an edge (u, v) into T
   26  until Q becomes empty
   27  return E
*  28         T



We now describe the algorithm of classic symbolic execution. At line 2 we
create the initial program state and then we insert it into a queue Q. The
queue Q keeps all program states for which we have not computed successor
program states yet. Until Q becomes empty, we iterate the loop at lines 5-26.
At line 7 we detect whether the currently processed program state s is final
or not. If it is not, we compute its successors at line 20. In short, the
function computeClassicSuccessors either executes actions of out-edges from
location s.l or it resolves a return from a call, if s.l is a function exit
location. We already gave an intuition of how to symbolically execute actions
at the beginning of Section 2. We further see at line 22 that we discard all
successors of s whose path conditions are not satisfiable. Discarded states
do not represent real behaviour of the program.
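The worklist structure of the unmarked lines can be sketched in Python as follows. This is a deliberately simplified toy, not the algorithm's actual implementation: a state is just a pair (path condition, location), there is no symbolic memory, no call stack and no error location, guards are plain literals such as "p" and "!p", and the satisfiability check is a syntactic test invented for this sketch.

```python
from collections import deque

def negate(lit):
    return lit[1:] if lit.startswith("!") else "!" + lit

def is_satisfiable(path_condition):
    """Toy check: a set of literals is satisfiable iff it holds no literal
    together with its negation."""
    return not any(negate(lit) in path_condition for lit in path_condition)

def execute_symbolically(program, entry, exit_location):
    """Worklist loop following the unmarked lines of Algorithm 1.

    program maps a location to a list of (guard, target) out-edges, where
    guard is a literal or None for an unconditional step.
    """
    final_states = []
    queue = deque([(frozenset(), entry)])              # lines 2-3
    while queue:                                       # lines 5-26
        phi, loc = queue.popleft()                     # line 6
        if loc == exit_location:                       # line 7
            final_states.append((phi, loc))            # line 8
            continue
        successors = [(phi | {g} if g else phi, tgt)   # line 20
                      for g, tgt in program[loc]]
        for state in successors:                       # line 22
            if is_satisfiable(state[0]):
                queue.append(state)                    # line 23
    return final_states

program = {
    "a": [("p", "b"), ("!p", "c")],
    "b": [("p", "d"), ("!p", "e")],   # p is tested again: one branch is infeasible
    "c": [(None, "exit")],
    "d": [(None, "exit")],
    "e": [(None, "exit")],
}
finals = execute_symbolically(program, "a", "exit")
```

The state reaching location e would carry the path condition {p, !p}; it is discarded at the check corresponding to line 22, so only two final states remain.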

Now we focus on the *-version of the algorithm. We create the root of the
tree labelled by the initial program state at line 4. When processing a state
s inside the loop we take the only leaf in the tree labelled with s at
line 21. We compute its successor vertices at lines 24 and 25. Note that the
successors are labelled by successor states of s.

We have to postpone the description of the □-version of the algorithm until
we have properly defined the templates the algorithm uses. The first step
toward the definition is the introduction of parameters and their
substitution.

We distinguish a set $\{\kappa, \tau, \kappa_1, \tau_1, \kappa_2, \tau_2, \ldots\}$
of variables called parameters, ranging over non-negative integers. We extend
numeric symbolic expressions such that they may also contain applications of
operators to parameters. We allow boolean symbolic expressions to contain
quantification of parameters. We further naturally extend symbolic memories,
call stacks and program states to contain symbolic expressions with
parameters. When we want to emphasise that $\boldsymbol{\kappa}$ is the set
of all parameters appearing in a symbolic expression $\varphi$, we denote it
as $\varphi[\boldsymbol{\kappa}]$. And similarly for symbolic memories, call
stacks and program states.

We now describe substitution of parameters. Each function from a finite set
of parameters to non-negative integers is a valuation. Let
$\varphi[\boldsymbol{\kappa}]$, $\theta[\boldsymbol{\kappa}]$,
$\Sigma[\boldsymbol{\kappa}]$ and $s[\boldsymbol{\kappa}]$ be a symbolic
expression, a symbolic memory, a call stack and a program state respectively,
$\boldsymbol{\kappa} \neq \emptyset$ and $\nu$ be a valuation defined for all
parameters in $\boldsymbol{\kappa}$. Then we compute $\varphi[\nu]$ from
$\varphi[\boldsymbol{\kappa}]$ such that we substitute all parameters in
$\varphi$ by related integers in $\nu$. We compute $\theta[\nu]$ from
$\theta[\boldsymbol{\kappa}]$ such that we substitute all parameters in all
the expressions in $\theta$ by related integers in $\nu$. Further, we compute
$\Sigma[\nu]$ from $\Sigma[\boldsymbol{\kappa}]$ such that we substitute all
parameters in restrictions $\sigma$ of all stack records of $\Sigma$ by
related integers in $\nu$. And finally, we compute $s[\nu]$ from
$s[\boldsymbol{\kappa}]$ such that we apply the substitution to its first
three components.

We often use the following simplified notation. If an expression $\varphi$
contains exactly one parameter $\kappa$ and $\{(\kappa, \nu)\}$ is a
valuation, then we write $\varphi[\kappa]$ and $\varphi[\nu]$ instead of
$\varphi[\{\kappa\}]$ and $\varphi[\{(\kappa, \nu)\}]$ respectively. The
notation also applies to symbolic memories, call stacks and program states.
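A minimal Python sketch of parameter substitution follows. It assumes a representation chosen only for illustration: a symbolic expression is a nested tuple (operator, operand, ...), a symbol is ("sym", name), a parameter is ("par", name), a symbolic memory is a dictionary, and a call stack is a list of records (sigma, return location). The location is left untouched, as in the text.

```python
def substitute(expr, valuation):
    """Replace every parameter ('par', name) in expr by valuation[name]."""
    if isinstance(expr, tuple):
        if expr[0] == "par":
            return valuation[expr[1]]
        if expr[0] == "sym":
            return expr
        return (expr[0],) + tuple(substitute(e, valuation) for e in expr[1:])
    return expr  # numeric or boolean constant

def substitute_memory(theta, valuation):
    return {var: substitute(e, valuation) for var, e in theta.items()}

def substitute_state(state, valuation):
    """Apply the valuation to the first three components of a program state."""
    theta, phi, stack, loc = state
    new_stack = [(substitute_memory(sigma, valuation), ret)
                 for sigma, ret in stack]
    return (substitute_memory(theta, valuation),
            substitute(phi, valuation), new_stack, loc)

# A parametrised state s[kappa] and its instance s[3] (made-up example).
i_plus_kappa = ("+", ("sym", "alpha_i"), ("par", "kappa"))
s_kappa = ({"i": i_plus_kappa},
           ("<", i_plus_kappa, ("sym", "alpha_n")), [], "e")
s_3 = substitute_state(s_kappa, {"kappa": 3})
```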

Next we define composition of program states and equivalence between them. 
We also express some basic equivalences for compositions. 

Definition 2 (Composition) Let $\Sigma = [r_0, \ldots, r_m]$ and
$\Sigma' = [r'_0, \ldots, r'_n]$ be call stacks and
$s = (\theta, \varphi, \Sigma, l)$ and $s' = (\theta', \varphi', \Sigma', l')$
be program states. Then the composite program state is
$s \circ s' = (\theta \circ \theta', \varphi \wedge \theta(\varphi'), \Sigma \circ (\theta \circ \Sigma'), l')$,
where $\theta(\varphi')$ is a symbolic expression constructed from $\varphi'$
such that all symbols $\alpha_i$ in $\varphi'$ are simultaneously substituted
by symbolic expressions $\theta(\theta_0^{-1}(\alpha_i))$,
$\theta \circ \theta'$ is a symbolic memory such that for each variable a we
have $(\theta \circ \theta')(a) = \theta(\theta'(a))$,
$\theta \circ \Sigma' = [\tilde{r}'_0, \ldots, \tilde{r}'_n]$, where each
$\tilde{r}'_i$ is equal to $r'_i$ except the first component being
$\tilde{r}'_i.\sigma = \theta \circ r'_i.\sigma$, and
$\Sigma \circ (\theta \circ \Sigma') = [r_0, \ldots, r_m, \tilde{r}'_0, \ldots, \tilde{r}'_n]$.
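Definition 2 can be sketched in Python under the following assumptions, all of which are choices made for this illustration: expressions are nested tuples with symbols written as ("sym", name), memories are dictionaries, call stack records are pairs (sigma, return location), and theta0_inv is a dictionary playing the role of $\theta_0^{-1}$, mapping each symbol name to its program variable.

```python
def subst_symbols(expr, theta, theta0_inv):
    """theta(expr): replace each symbol alpha by theta(theta0^{-1}(alpha))."""
    if isinstance(expr, tuple):
        if expr[0] == "sym":
            return theta[theta0_inv[expr[1]]]
        return (expr[0],) + tuple(subst_symbols(e, theta, theta0_inv)
                                  for e in expr[1:])
    return expr  # constants and parameter names are left unchanged

def compose_memory(theta, theta2, theta0_inv):
    """(theta o theta2)(a) = theta(theta2(a)) for each variable a."""
    return {var: subst_symbols(e, theta, theta0_inv)
            for var, e in theta2.items()}

def compose_state(s, s2, theta0_inv):
    theta, phi, stack, _ = s
    theta2, phi2, stack2, loc2 = s2
    # theta o Sigma': shift the local memories of all records of the second stack.
    shifted = [(compose_memory(theta, sigma, theta0_inv), ret)
               for sigma, ret in stack2]
    return (compose_memory(theta, theta2, theta0_inv),
            ("and", phi, subst_symbols(phi2, theta, theta0_inv)),
            stack + shifted,
            loc2)

theta0_inv = {"alpha_i": "i", "alpha_n": "n"}
# s: state after an assignment i := 0; s2: one step i := i + 1 guarded by i < n.
s = ({"i": 0, "n": ("sym", "alpha_n")}, True, [], "l")
s2 = ({"i": ("+", ("sym", "alpha_i"), 1), "n": ("sym", "alpha_n")},
      ("<", ("sym", "alpha_i"), ("sym", "alpha_n")), [], "l2")
composed = compose_state(s, s2, theta0_inv)
```

The expressions are not simplified, so the composed memory stores i as the tuple for 0 + 1 rather than the constant 1; a real implementation would normalise expressions before handing them to a solver.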

Definition 3 (Equivalence) Let $\varphi$, $\varphi'$ be symbolic expressions,
$\theta$, $\theta'$ be symbolic memories, $\Sigma = [r_0, \ldots, r_m]$,
$\Sigma' = [r'_0, \ldots, r'_n]$ be call stacks and $s$, $s'$ be program
states. Then $\varphi \equiv \varphi'$, if $\varphi$ and $\varphi'$ are either
logically equivalent boolean symbolic expressions or numeric symbolic
expressions such that $(\varphi = \varphi') \equiv \mathit{true}$.
$\theta \equiv \theta'$, if for each variable a we have
$\theta(a) \equiv \theta'(a)$. $\Sigma \equiv \Sigma'$, if $m = n$ and for
each $i \in \{0, \ldots, m\}$ we have that $r_i.\sigma$ and $r'_i.\sigma$ are
defined for the same variables with equivalent values and $r_i.l = r'_i.l$.
And $s \equiv s'$, if both $s$ and $s'$ have equal or equivalent components.

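Lemma 1 (Equivalent Compositions) Let $s$, $s'$ and $s''$ be program states,
$\nu$ and $\nu'$ be valuations of all parameters in $s$ and $s'$ respectively
such that $\nu \cup \nu'$ is also a valuation, $\theta$, $\theta'$ and
$\theta''$ be symbolic memories, and $\varphi$ and $\psi \wedge \psi'$ be
symbolic expressions. Then $s \circ (s' \circ s'') \equiv (s \circ s') \circ s''$,
$s[\nu] \circ s'[\nu'] \equiv (s \circ s')[\nu \cup \nu']$,
$\theta \circ (\theta' \circ \theta'') \equiv (\theta \circ \theta') \circ \theta''$,
$(\theta \circ \theta')(\varphi) \equiv \theta(\theta'(\varphi))$ and
$\theta(\psi) \wedge \theta(\psi') \equiv \theta(\psi \wedge \psi')$.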

Before we formulate a definition of templates with one parameter we give its
intuition. Let us consider a part of the program P with an entry location e
and n distinct exit locations $x_1, \ldots, x_n$. We saw in Section 2 that
the key properties for building a template of the part are program states
$s_1[\kappa], \ldots, s_n[\kappa]$ at exit locations $x_1, \ldots, x_n$. We
need to ensure that the states $s_i[\kappa]$ correctly represent the
behaviour of the analysed part. King proved [8] that path conditions at leaf
vertices of the symbolic execution tree T of P are satisfiable. Therefore, if
$s_i.\varphi$ is not satisfiable, then there cannot be a path in T traversing
the part from e to $x_i$. The exit $x_i$ is thus useless for the construction
of the template and we omit it. King further showed [8] that for two
different leaf vertices u and v of T we have
$u.\varphi \wedge v.\varphi \equiv \mathit{false}$. This statement is also
valid for program parts. So, we require
$(s_i.\varphi \wedge s_j.\varphi) \equiv \mathit{false}$ for all different i
and j. We summarise these requirements in the following definition.

Definition 4 (Template with one parameter) Let T be the symbolic execution
tree of P computed by the *-version of Algorithm 1, $n > 0$ be an integer,
$l, l_1, \ldots, l_n$ be locations in P, $\kappa$ be a parameter,
$\theta_1[\kappa], \ldots, \theta_n[\kappa]$ be symbolic memories,
$\varphi_1[\kappa], \ldots, \varphi_n[\kappa]$ be satisfiable boolean symbolic
expressions such that for each $i, j \in \{1, \ldots, n\}$, $i \neq j$ we
have $(\varphi_i \wedge \varphi_j) \equiv \mathit{false}$ and let
$\Sigma_1[\kappa], \ldots, \Sigma_n[\kappa]$ be call stacks. Then a tuple
$t = (l, n, \{(\theta_1, \varphi_1, \Sigma_1, l_1), \ldots, (\theta_n, \varphi_n, \Sigma_n, l_n)\})$
is a template with one parameter $\kappa$ in P, if

(L1) For each path $\pi = u\omega$ in T from any vertex u satisfying
$u.l = t.l$ to a leaf, there is a vertex $w \in \omega$, an index
$i \in \{1, \ldots, n\}$ and an integer $\nu \geq 0$, such that
$w.s \equiv u.s \circ (t.\theta_i, t.\varphi_i, t.\Sigma_i, t.l_i)[\nu]$.

(L2) For each vertex u of T, an index $i \in \{1, \ldots, n\}$ and a
non-negative integer $\nu$ such that $u.l = t.l$ and
$(u.\varphi \wedge u.\theta(t.\varphi_i[\nu]))$ is satisfiable, there is a
successor w of u in T such that
$w.s \equiv u.s \circ (t.\theta_i, t.\varphi_i, t.\Sigma_i, t.l_i)[\nu]$.
Note that requirement (L1) guarantees that no path in T with vertices u and v
such that $u.l = l$ and $v.l = l_i$ is suppressed by the state
$(t.\theta_i, t.\varphi_i, t.\Sigma_i, t.l_i)[\kappa]$. And requirement (L2)
guarantees that the program state
$(t.\theta_i, t.\varphi_i, t.\Sigma_i, t.l_i)[\kappa]$ does not produce
unreal paths.

We are ready to describe the □-version of Algorithm 1. At line 1 we detect
templates with one parameter in the passed program P. That is a task for
so-called template detectors. We discuss a possible construction of such a
detector in Section 4. The only purpose of lines 11-20 is to compute
successor states of the currently processed program state s. At line 11 we
call a system function getTemplatesAt, which selects those templates whose
entry locations match the actual program location s.l. If the selection is
not empty we may instantiate one of the selected templates. A system function
chooseTemplate is supposed to choose exactly one template t to be
instantiated. We may for example choose randomly. We do not put any
constraints on the selection strategy. To prevent parameter collisions with
already instantiated templates, we first get a fresh parameter at line 14 and
then we replace the parameter used in t by default by the fresh one. Then we
get to the loop at line 16. For each program state in the template we create
one successor program state of s (see line 17).
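Lines 14-18 can be sketched in Python as below. The sketch makes several simplifying assumptions that are not part of the algorithm: call stacks are omitted, expressions are nested tuples with ("sym", name) and ("par", name) leaves, theta0_inv maps symbol names to variables, and the template itself is a hand-written one for a hypothetical loop "while i != n: i := i + 1" with the single exit e, not a template computed by a detector.

```python
import itertools

_fresh = (f"kappa{i}" for i in itertools.count(1))   # source of fresh parameters

def rewrite(expr, on_leaf):
    """Rebuild expr, passing ('sym', x) and ('par', x) leaves to on_leaf."""
    if isinstance(expr, tuple):
        if expr[0] in ("sym", "par"):
            return on_leaf(expr)
        return (expr[0],) + tuple(rewrite(e, on_leaf) for e in expr[1:])
    return expr

def instantiate(s, template, theta0_inv):
    """Successor states of s for one chosen template (call stacks omitted)."""
    theta, phi, _ = s
    kappa = next(_fresh)                                     # line 14

    def leaf(e):
        if e[0] == "par":
            return ("par", kappa)                            # line 15
        return theta[theta0_inv[e[1]]]                       # composition with s

    successors = []
    for theta_i, phi_i, loc_i in template["states"]:         # line 16
        new_theta = {v: rewrite(e, leaf) for v, e in theta_i.items()}
        new_phi = ("and", phi, rewrite(phi_i, leaf))
        successors.append((new_theta, new_phi, loc_i))       # lines 17-18
    return successors

i_after = ("+", ("sym", "alpha_i"), ("par", "kappa"))
template = {"entry": "b",
            "states": [({"i": i_after, "n": ("sym", "alpha_n")},
                        ("=", i_after, ("sym", "alpha_n")), "e")]}
theta0_inv = {"alpha_i": "i", "alpha_n": "n"}
s = ({"i": 0, "n": ("sym", "alpha_n")}, True, "b")
succ = instantiate(s, template, theta0_inv)
```

Renaming the parameter and shifting the symbols are done in one traversal here; parameters already present in the memory of s come from earlier instantiations and are deliberately left untouched.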

We finish this section by formulating soundness and completeness theorems for
compact symbolic execution using templates with one parameter. They say that
both classic and compact symbolic execution explore the same set of real
paths of P. We assume that T and T' are symbolic execution trees of the
program P computed by the *- and □,*-versions of Algorithm 1 respectively.

Theorem 1 (Soundness) For each leaf vertex $e \in T$ there is a leaf vertex
$e' \in T'$ and a valuation $\nu$ of all parameters in $e'.s$ such that
$e.s \equiv e'.s[\nu]$.

Theorem 2 (Completeness) For each leaf vertex $e' \in T'$ there is a leaf
vertex $e \in T$ and a valuation $\nu$ of all parameters in $e'.s$ such that
$e.s \equiv e'.s[\nu]$.

Because of space limitations we omit proofs of the theorems. Nevertheless, 
an interested reader may find the proofs in our technical report [13] . 

4 Computation of Templates 

In this section we present an algorithm computing templates with one
parameter for program parts consisting of a cyclic path with a single
specified entry location and several exit locations.

Let P be a program and let us suppose we have a program part of P with a
cyclic path, an entry location e and some exit location x. We show how to
compute a symbolic memory $\theta_x[\kappa]$, a path condition
$\varphi_x[\kappa]$ and a call stack $\Sigma_x[\kappa]$ at the exit location
x. The computation of the remaining parts of the resulting template is then
straightforward.

The algorithm proceeds in two steps. First, we compute a program state
$(\vec{\theta}, \vec{\varphi}, [\,], e)$ resulting from classic symbolic
execution of the cyclic path of the part exactly once, and a program state
$(\hat{\theta}, \hat{\varphi}, \hat{\Sigma}, x)$ resulting from classic
symbolic execution of a path from e to x. The second step is to express
$\theta_x[\kappa]$, $\varphi_x[\kappa]$ and $\Sigma_x[\kappa]$ in terms of the
program states computed in the first step.

The computation of program states $(\vec{\theta}, \vec{\varphi}, [\,], e)$
and $(\hat{\theta}, \hat{\varphi}, \hat{\Sigma}, x)$ requires running classic
symbolic execution on the analysed program part. But Algorithm 1 can only
execute programs satisfying Definition 1. Therefore, we create a new program,
say P', representing the analysed part.

We start with a program P' consisting of all variables of P and of all those
functions of P that contain at least one location of the cycle. Note that the
cyclic path of the part may traverse several functions through call sites. We
now remove from P' all locations and edges that belong neither to the cycle nor
to the path from e to x. We assume that x does not belong to the cyclic path,
since otherwise we can always create its copy outside the cycle. Next we mark
the function in P' containing the entry location e as the starting function of
P' and we set e to be the entry location of that function. Then we create a new
location e' representing the exit location of the starting function. Now we
break the cyclic path at the entry e by redirecting the only in-edge of e that
belongs to the cycle to e'. Finally, we turn x into an error location by adding
a self-loop edge with a skip action.
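
The construction of P' can be sketched on a plain edge set. The following Python
fragment is only an illustration under simplifying assumptions: locations are
strings, the whole part lies in a single function (call sites are ignored),
actions on edges are omitted, and the names build_part_program and "e'" are
hypothetical and do not come from the paper.

```python
def build_part_program(edges, cycle, exit_path):
    """Sketch of the construction of P' for one cycle and one exit.

    edges     -- set of (src, dst) pairs of the original program P
    cycle     -- locations of the cyclic path, starting at the entry e;
                 the closing edge cycle[-1] -> e is implied
    exit_path -- locations of the path from e to the exit x
    """
    e, x = cycle[0], exit_path[-1]
    cycle_edges = set(zip(cycle, cycle[1:] + [e]))
    path_edges = set(zip(exit_path, exit_path[1:]))

    # Keep only edges on the cycle or on the path from e to x.
    kept = (cycle_edges | path_edges) & set(edges)

    # Break the cycle: the in-edge of e on the cycle now leads to a new
    # exit location e' (assumed not to clash with an existing location).
    e_prime = "e'"
    back_edge = (cycle[-1], e)
    new_edges = {(s, e_prime) if (s, d) == back_edge else (s, d)
                 for (s, d) in kept}

    # Turn x into an error location by a self-loop (standing for `skip`).
    new_edges.add((x, x))
    return {"entry": e, "exit": e_prime, "error": x, "edges": new_edges}


# Hypothetical loop: pre -> e, e -> b -> e (cycle), e -> x (exit), x -> post.
P = {("pre", "e"), ("e", "b"), ("b", "e"), ("e", "x"), ("x", "post")}
P_prime = build_part_program(P, ["e", "b"], ["e", "x"])
```

Running classic symbolic execution on such a P' can only end in e' (one pass
around the cycle) or in x (the exit path), which is why at most two program
states result.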

P' is now a program according to Definition 1, so we can run the unmarked
version of Algorithm 1. Note that the algorithm must always terminate for P'.
Let E be the set of resulting program states. Then $|E| \le 2$. If there is no
$s \in E$ such that s.l = e, then we do not create the template for the part,
since there is no real path around the cycle. If there is no state $s \in E$
such that s.l = x, then we discard the exit x from consideration for the
template, since it is impossible to leave the loop through x. Otherwise, E
contains exactly two program states, which are the states we are looking for.

Now we show how to express $\theta_x[\kappa]$, $\varphi_x[\kappa]$ and $S_x[\kappa]$ in terms of the program
states computed above. Let T be a symbolic execution tree of P, computed by the
♦-version of Algorithm 1. Further, let u be a vertex of T such that u.l = e and
let $\pi = u \ldots u_1 \ldots u_2 \ldots u_\nu \ldots w$ be a path in T starting at u, iterating the
cycle of the part exactly $\nu \ge 0$ times, i.e. all the vertices $u_i$ have $u_i.l = e$,
and then $\pi$ leaves the cycle into the vertex w with w.l = x. We use memory
composition to express memories of vertices along $\pi$ as follows.

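\[
\begin{aligned}
u_1.\theta &= u.\theta \circ \theta \\
u_2.\theta &= u_1.\theta \circ \theta = u.\theta \circ (\theta \circ \theta) \\
u_\nu.\theta &= u_{\nu-1}.\theta \circ \theta = u.\theta \circ (\underbrace{\theta \circ \cdots \circ \theta}_{\nu})
\end{aligned}
\]

If we denote the composition of i symbolic memories $\theta$ by $\theta^i$, where $\theta^0 = \theta_I$
is the identity memory and $\theta^1 = \theta$, then we have $u_i.\theta = u.\theta \circ \theta^i$ and we get

\[
w.\theta = u.\theta \circ (\theta^\nu \circ \hat\theta).
\]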
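
To make memory composition concrete, here is a minimal Python sketch. It
assumes that symbolic expressions are affine (a dictionary of integer
coefficients per symbol plus a constant) and that a variable missing from a
memory is unchanged; this representation and the names var, apply, compose and
power are illustrative assumptions, not the paper's data structures. The loop
body `s += i; i += 1` and the exit path `r = s` are hypothetical.

```python
def var(name):
    """The expression consisting of the symbol of variable `name`."""
    return ({name: 1}, 0)

def apply(mem, expr):
    """mem(expr): replace every symbol b in expr by mem(b)."""
    coeffs, const = expr
    out = {}
    for name, k in coeffs.items():
        sub_coeffs, sub_const = mem.get(name, var(name))
        const += k * sub_const
        for sym, k2 in sub_coeffs.items():
            out[sym] = out.get(sym, 0) + k * k2
    return ({s: k for s, k in out.items() if k != 0}, const)

def compose(m1, m2):
    """(m1 o m2)(a) = m1(m2(a)) for every variable a."""
    return {a: apply(m1, m2.get(a, var(a))) for a in set(m1) | set(m2)}

def power(mem, n):
    """n-fold composition of mem; the empty dict is the identity memory."""
    result = {}
    for _ in range(n):
        result = compose(result, mem)
    return result

# One pass around the hypothetical cycle `s += i; i += 1`.
theta = {"i": ({"i": 1}, 1), "s": ({"s": 1, "i": 1}, 0)}
# Hypothetical exit path `r = s`.
theta_hat = {"r": var("s")}

# Memory at w after 3 iterations, taking u's memory to be the identity.
w_theta = compose(power(theta, 3), theta_hat)
```

Here w_theta maps i to i + 3 and both s and r to s + 3i + 3, the sum
accumulated over three iterations.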
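
We proceed similarly to express path conditions of vertices along $\pi$.

\[
\begin{aligned}
u_1.\varphi &= u.\varphi \wedge u.\theta(\varphi) = u.\varphi \wedge (u.\theta \circ \theta^0)(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi)) \\
u_2.\varphi &= u_1.\varphi \wedge u_1.\theta(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi)) \wedge (u.\theta \circ \theta^1)(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi) \wedge \theta^1(\varphi)) \\
u_\nu.\varphi &= u_{\nu-1}.\varphi \wedge u_{\nu-1}.\theta(\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi) \wedge \ldots \wedge \theta^{\nu-1}(\varphi))
\end{aligned}
\]

Using the following equivalence

\[
\theta^0(\varphi) \wedge \ldots \wedge \theta^{\nu-1}(\varphi) \;\equiv\; 0 \le \nu \wedge \forall\tau\,(0 \le \tau < \nu \rightarrow \theta^\tau(\varphi)),
\]

we can write

\[
\begin{aligned}
w.\varphi &= u_\nu.\varphi \wedge u_\nu.\theta(\hat\varphi) = u.\varphi \wedge u.\theta(\theta^0(\varphi) \wedge \ldots \wedge \theta^{\nu-1}(\varphi) \wedge \theta^\nu(\hat\varphi)) \\
&= u.\varphi \wedge u.\theta(0 \le \nu \wedge \forall\tau\,(0 \le \tau < \nu \rightarrow \theta^\tau(\varphi)) \wedge \theta^\nu(\hat\varphi)).
\end{aligned}
\]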

SMT solvers do not support the memory composition operation appearing in the
formula $w.\varphi$. Therefore, we need an equivalent declarative description of the
operation. Such a description is a parametrised symbolic memory $\theta[\kappa]$, where
we require $\theta[\kappa] = \theta^\kappa$ for any $\kappa \ge 0$. For a given symbolic memory $\theta$ we compute
the content of $\theta[\kappa]$ per variable by applying the following two rules

\[
\frac{\theta(a) = \theta_I(a) + c, \quad a \text{ is of a numeric type}, \quad c \text{ is a numeric constant of } a\text{'s type}}
     {\theta[\kappa](a) = \theta_I(a) + c \cdot \mathrm{typeOf}\langle a \rangle(\kappa)}
\]

\[
\frac{\theta(A) = \theta_I(A), \quad A \text{ is of an array type}}
     {\theta[\kappa](A) = \theta_I(A)}
\]

where $\theta_I$ denotes the identity memory and the expression $\mathrm{typeOf}\langle a \rangle(\kappa)$
represents a cast of $\kappa$ to the type of variable a. If there is a variable that
does not match any of the rules, then we fail to compute $\theta[\kappa]$, and thus we
fail to compute the template. Obviously, one can provide more rules for more
complex symbolic memories. The presented rules are only supposed to illustrate
the process.
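
The two rules can be sketched in Python as follows. This is a simplified
illustration, not the paper's implementation: expressions are assumed to be
affine (integer coefficients per symbol plus a constant), all numeric variables
are plain integers so the typeOf cast is omitted, an unchanged array is treated
like any other unchanged variable, and the names parametrise and instantiate
are hypothetical.

```python
def parametrise(theta):
    """Try to compute the parametrised memory for theta.

    theta maps each variable a to (coeffs, const). A variable matches a rule
    only if theta(a) = a + c for a constant c (c = 0 covers unchanged
    variables, including unchanged arrays). The result maps a to c, meaning
    theta[kappa](a) = a + c * kappa. Returns None if some variable matches no
    rule, in which case no template is computed.
    """
    slopes = {}
    for a, (coeffs, const) in theta.items():
        if coeffs != {a: 1}:
            return None
        slopes[a] = const
    return slopes

def instantiate(slopes, kappa):
    """The memory theta[kappa] for a concrete parameter value kappa."""
    return {a: ({a: 1}, c * kappa) for a, c in slopes.items()}

# Cycle `i += 1` with n untouched: both variables match a rule.
counting = parametrise({"i": ({"i": 1}, 1), "n": ({"n": 1}, 0)})

# Cycle `s += i`: s is not changed by a constant, so the rules fail.
summing = parametrise({"s": ({"s": 1, "i": 1}, 0)})
```

For the first memory the sketch yields i + kappa for i and leaves n unchanged.
The second memory is rejected, which matches the remark that further rules
would be needed for more complex memories.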
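
Having $\theta[\kappa]$ we define

\[
\theta_x[\kappa] = \theta[\kappa] \circ \hat\theta
\]

\[
\varphi_x[\kappa] = 0 \le \kappa \wedge \forall\tau\,(0 \le \tau < \kappa \rightarrow \theta[\tau](\varphi)) \wedge \theta[\kappa](\hat\varphi)
\]

and we get $w.\theta = u.\theta \circ \theta_x[\nu]$, $w.\varphi = u.\varphi \wedge u.\theta(\varphi_x[\nu])$ and
$w.S = u.S \circ (u.\theta \circ S_x[\nu])$. Using these equivalences we write
$w.s = u.s \circ (\theta_x, \varphi_x, S_x, x)[\nu]$, which is exactly the equivalence used in
Definition 4 (L1) and (L2).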
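
As a worked illustration, consider a hypothetical loop `while (i < n) i = i + 1;`
that is not taken from the paper. We write i and n for the symbols of the
variables at the entry location e, $(\theta, \varphi, [], e)$ for the state after one pass
around the cycle and $(\hat\theta, \hat\varphi, \hat S, x)$ for the state after the exit path. Under
these assumptions the definitions above instantiate as follows.

```latex
\begin{align*}
\theta(i) &= i + 1, \quad \theta(n) = n, \quad \varphi \equiv i < n
  && \text{(one pass around the cycle)} \\
\hat\theta(i) &= i, \quad \hat\theta(n) = n, \quad \hat\varphi \equiv \neg(i < n)
  && \text{(exit path, memory unchanged)} \\
\theta[\kappa](i) &= i + \kappa, \quad \theta[\kappa](n) = n
  && \text{(numeric rule with } c = 1 \text{ and } c = 0\text{)} \\
\theta_x[\kappa] &= \theta[\kappa] \circ \hat\theta = \theta[\kappa] \\
\varphi_x[\kappa] &= 0 \le \kappa \;\wedge\; \forall\tau\,(0 \le \tau < \kappa \rightarrow i + \tau < n)
  \;\wedge\; \neg(i + \kappa < n)
\end{align*}
```

Over the integers, $\varphi_x[\kappa]$ holds exactly for $\kappa = \max(0, n - i)$: the last conjunct
forces $\kappa \ge n - i$, while the instance $\tau = \kappa - 1$ of the quantified part forces
$\kappa \le n - i$ whenever $\kappa > 0$. A single vertex with this path condition therefore
stands for all paths leaving the loop.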

5 Discussion



We presented compact symbolic execution using only templates with a single
parameter. We further restricted ourselves to computing templates only for
program parts consisting of cyclic paths. We can reduce the size of the
symbolic execution tree even further if we create templates for more complex
program parts and use more parameters. Let us consider the function countIf
in Figure 5. The program loop in the function consists of two cyclic paths
around it. We have already discussed templates for both cycles in Section 2.
But if we built a single template using two parameters (one parameter per
cyclic path), then the resulting compact symbolic execution tree would be
finite. Further, we have not considered possible relations between different
instances of templates. A good example is recursion. We first have a sequence
of recursive calls and later we return from them. If we create one template
for the phase of recursive calls and a second one for the phase of returning,
then instances of these templates must appear in pairs along paths in the
symbolic execution tree. (Our technical report [13] provides a definition of
templates for recursive functions.) Clearly, there is room for extensions of
the basic concepts we presented here.

Let us consider the well-known binarySearch algorithm. Template detection for
this program (even with a single parameter) may infer geometric progressions
as values of some variables. When these get into a path condition, they may
cause serious performance issues for the SMT solver.

Compact symbolic execution commonly places higher performance demands on SMT
solvers than the classic one. Path conditions may contain template parameters
besides symbols, and these parameters are quantified. This is the price of the
ability to reason about multiple program paths at once.

King showed the effectiveness of symbolic execution for automated test
generation [8]. Producing a good test typically means reaching some interesting
(e.g. bug-suspicious) program location. Compact symbolic execution can be very
helpful in this task. Let us consider a situation in which reachability of such
a target location depends on an exact number of iterations of a particular
cycle. Given a template for a program part with the cycle, we can
simultaneously reason about all the paths exiting from the cycle. Therefore,
instead of exploring the path space by classic symbolic execution, we can just
send a query to an SMT solver to check satisfiability of the parametrised path
condition.
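
The idea can be illustrated with a small Python sketch. It does not call an SMT
solver; a bounded brute-force search stands in for the satisfiability query,
which is an assumption made only to keep the example self-contained. The
program (a loop `while (i < n) i += 1` started with i = 0, followed by a target
guarded by `i == 7`) and the names phi_x, reaches_target and find_input are
hypothetical.

```python
from itertools import product

def phi_x(i, n, kappa):
    """Parametrised exit condition of the hypothetical loop:
    0 <= kappa, the loop guard holds in iterations 0..kappa-1,
    and it fails after kappa iterations."""
    return (kappa >= 0
            and all(i + tau < n for tau in range(kappa))
            and not (i + kappa < n))

def reaches_target(n, kappa, i0=0):
    """Path condition of the target: leave the loop after kappa iterations
    (the exit memory gives i = i0 + kappa) and pass the guard i == 7."""
    return phi_x(i0, n, kappa) and i0 + kappa == 7

def find_input(bound=20):
    """Bounded stand-in for an SMT query over the input n and parameter kappa."""
    for n, kappa in product(range(-bound, bound + 1), range(bound + 1)):
        if reaches_target(n, kappa):
            return n, kappa
    return None

witness = find_input()
```

A single query over n and the parameter replaces the exploration of one path
per iteration count. In this sketch the only witness is n = 7 with seven
iterations.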

King also showed in his paper [8] how symbolic execution can be used for
proving program correctness according to Floyd's method [3]. Using templates
we can decrease, or in some cases even eliminate, the need for loop invariants.
For programs where compact symbolic execution is finite in contrast to the
classic one, we do not need loop invariants at all. For other programs, loop
templates describe the behaviour of some paths through the loop, and we may
therefore provide simpler invariants for the remaining behaviour of the loop.

6 Related Work



Compact symbolic execution is closely related to the work of King from 1976 [8],
where the author introduced the general concept of classic symbolic execution.
Besides the description of symbolic execution, King discussed its applicability
to program testing and to formal proofs of correctness according to Floyd's
method [3]. Nevertheless, issues like the path explosion problem were not tackled.

In [5], the authors propose instrumenting a program with code that provides lazy
initialisation of dynamically allocated data structures like lists or trees.
This enables symbolic execution of the instrumented program by a standard model
checker without building a dedicated tool. The lazy initialisation algorithm is
further improved and formally defined as an operational semantics of a core
subset of the Java Virtual Machine in [5].

The scalability of symbolic execution to real-world programs can be improved by
exploring only the client's code [7]. Library code (like string manipulation or
standard containers like sets or maps) can be assumed to be well defined and
properly tested.

There are several symbolic-execution-based techniques constructing loop
summaries or simply counting loop iterations [5,11,12]. The introduction of
counters usually makes it possible to speak about multiple paths through a loop
at once. A technique presented in [5] analyses loops on-the-fly, i.e. during
simultaneous concrete and symbolic execution of a program for a concrete input.
The loop analysis infers inductive variables, i.e. variables that are modified
by a constant value in each loop iteration. These variables are used to build
loop summaries expressed in the form of pre- and postconditions. The LESE
technique presented in [11] introduces symbolic variables for the number of
times each loop was executed. LESE links the symbolic variables with features
of a known grammar generating inputs. Using these links, the grammar can
control the numbers of loop iterations performed on a generated input. A
symbolic-execution-based algorithm in [12] produces a nontrivial necessary
condition on input values to drive the program execution to the given location.
The key part of the technique is the computation of loop summaries in the form
of symbolic program states and path conditions, both parametrised by so-called
path counters. Each path counter is assigned to an individual path through the
analysed loop.

There are also approaches computing function summaries [4,1]. Reusing summaries
at call sites typically leads to an interesting performance improvement.
Moreover, summaries may insert additional symbolic values into a path condition,
which often leads to another performance improvement.

Finally, there are also techniques partitioning program paths into separate
classes according to the impact of the paths on a given set of program
variables [9,10]. Values of output variables are typically used as the
partitioning criterion.

7 Conclusion 

We introduced a generalisation of classic symbolic execution called compact
symbolic execution. We generalised the notion of symbols of classic symbolic
execution such that symbols can now be related to different program locations.
This allows us to analyse individual parts of a given program separately from
the rest of the program. We further introduced the concept of templates
representing declarative parametric descriptions of the behaviour of separately
analysed program parts. We gave a precise definition of templates with one
parameter and we provided an algorithm of compact symbolic execution using
these templates.